UniqueProt: creating representative protein sequence sets

نویسندگان

چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

UniqueProt: creating representative protein sequence sets

UniqueProt is a practical and easy to use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm using the HSSP-value to establish sequence similarity. UniqueProt is not a real clustering program in the sense that the 'representatives' are not at the centres of well-defined clus...

متن کامل

Selection of representative protein data sets.

The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to ex...

متن کامل

Representative Protein Sequence and Structure Database

The database provides the information about the non-redundant protein dataset (1573 proteins) obtained from the Protein Data Bank. The information includes PDB ID, Length of the protein, Resolution, PDB Secondary structure, PDB secondary structure summary, PHD secondary structure prediction, PHD secondary structure prediction summary, sequence. We further revised the PDB Secondary structure sum...

متن کامل

Selecting Representative Data Sets

A training set is a special set of labeled data providing known information that is used in the supervised learning to build a classification or regression model. We can imagine each training instance as a feature vector together with an appropriate output value (label, class identifier). A supervised learning algorithm deduces a classification or regression function from the given training set...

متن کامل

RSDB: representative protein sequence databases have high information content

MOTIVATION Biological sequence databases are highly redundant for two main reasons: 1. various databanks keep redundant sequences with many identical and nearly identical sequences 2. natural sequences often have high sequence identities due to gene duplication. We wanted to know how many sequences can be removed before the databases start losing homology information. Can a database of sequence...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Nucleic Acids Research

سال: 2003

ISSN: 1362-4962

DOI: 10.1093/nar/gkg620